
Advanced Graphics and Data Visualization in R

Lecture 03: ggplot2 and adding those finishing touches

0.1.0 An overview of Advanced Graphics and Data Visualization in R

"Advanced Graphics and Data Visualization in R" is brought to you by the Centre for the Analysis of Genome Evolution & Function's (CAGEF) bioinformatics training initiative. This CSB1021 was developed to enhance the skills of students with basic backgrounds in R by focusing on available philosophies, methods, and packages for plotting scientific data. While the datasets and examples used in this course will be centred on SARS-CoV-2 epidemiological and genomic data, the lessons learned herein will be broadly applicable.

This lesson is the third in a 6-part series. The aim for the end of this series is for students to recognize how to import, format, and display data based on their intended message and audience. The format and style of these visualizations will help to identify and convey the key message(s) from their experimental data.

The structure of the class is a code-along style in Jupyter notebooks. At the start of each lecture, skeleton versions of the lecture will be provided for use on the University of Toronto Jupyter Hub so students can program along with the instructor.


0.2.0 Lecture objectives

Last week we did a deep dive on some of the more popular and broadly applicable visualizations for conveying basic ideas about your data. This week will focus on tidying up your visualizations and adding those extra finishing touches that will help polish them off: adding, removing, and altering graph elements. Getting these little details right helps you avoid editing your figures with additional software outside of R.

At the end of this lecture you will have covered the following topics:

  1. Altering and reproducing themes.
  2. Setting and changing the content of titles, axes, text, and legends.
  3. Annotating with text, and highlighting.
  4. Altering your plot with new geoms, as well as data/axis/text manipulations.
  5. Arranging plots together in the same figure.

0.3.0 A legend for text format in Jupyter markdown

grey background - a package, function, code, command or directory. Backticks are also used for in-line code.
italics - an important term or concept or an individual file or folder
bold - heading or a term that is being defined
blue text - named or unnamed hyperlink


0.4.0 Data used in this lesson

Today we'll revisit a number of datasets that we've used in previous lectures.

0.4.1 Dataset 1: Lecture03.RData

This data file contains 4 objects:

  1. covid_phu_long.df: COVID-19 daily case values across Ontario public health units (PHUs), seen in lecture 01.
  2. covid_phu_window.df: sliding window data generated from covid_phu_long.df based on a 14-day rolling mean.
  3. phu_by_total_cases_desc: a list of Ontario PHUs in descending order by caseload.
  4. covid_demographics_total.df: age group demographics in a long format that we generated in lecture 02.

0.5.0 Packages used in this lesson

repr - a package useful for altering some of the attributes of objects related to the R kernel.

tidyverse which has a number of packages including dplyr, tidyr, stringr, forcats and ggplot2

viridis helps to create color-blind palettes for our data visualizations

lubridate and zoo are helper packages used for working with date formats in R

ggthemes, directlabels, ggforce, ggbeeswarm, gghighlight, and ggExtra will provide us new geoms and methods for plotting or altering how our plots look.

ggpubr for arranging our plots.


1.0.0 Present your data in its best format and form

Last week in lecture 2 we spent our time highlighting various types of plots and their variants while discerning the proper circumstances of their use. Now that we know which plots to use and when to use them, we can focus on how to clean up your visualizations so they can be their "best self".

Through both lectures and assignments we have already glimpsed at some of the commands and layers we can use to improve upon our graphs whether that is by choosing colour, titles, or legend information. Today we'll explore those options more deeply so you don't have to spend days trying to get your visualizations to look perfect. We'll revisit some old plots and build them up from basics.

Let's start with our PHU caseload data from lecture 1. We'll load it from a .RData file along with some other helpful objects.

From the plot above, we can immediately see that we have issues that need remedying:

  1. The overall font size of the plot is small, and I have old eyes.
  2. The legend title is quite large and based specifically on the aes() assignment used.
  3. Our axes need to be updated and we could use a title too.

1.1.0 Control the display of all non-data elements with theme()

Although we haven't directly discussed themes yet, we have seen them appearing here and there in our individual plots. Themes set and control the presentation of titles, labels, text, backgrounds, legends, etc. They don't change the actual information presented in these elements.

Calls to theme() generally take the form of theme(element.component.sub-component = element_*(parameter = value))

Some basic elements include line, rect, text, title, and aspect.ratio. Altering these elements in theme() will alter all elements of their kind (i.e. all lines, rectangles, text, etc.). Alternatively, specific element components can be altered more directly. The following table lists most of the possible theme elements and components. They can be as specific as axis.title.x.top. More detailed descriptions can be found here.

Element | Description | Components | Sub-components | Other
axis | x and y axis elements | title, text, ticks, line | x, y, length | top, bottom, left, right
legend | all legend elements | background, margin, spacing, key, text, title, position, direction, justification, box | x, y, size, height, width, align, just, spacing |
panel | background of the plotting area | background, border, spacing, grid | x, y, major, minor |
plot | entire plot | background, title, subtitle, caption, tag, margin | | position
strip | facet labels | background, placement, text, switch | x, y, text, pad | grid, wrap



You update or set your individual elements using the element_*() functions. Within each element you can typically control aesthetics like fill, colour/color, etc. Below is a summary of the element functions and their parameters; each theme element above expects a specific element_*() type.

element call | description | parameters
element_line() | formatting of lines | colour, size, linetype, lineend, arrow
element_text() | formatting of text | family, face, colour, size, hjust, vjust, angle, lineheight, margin
element_rect() | borders and background | fill, colour, size, linetype
element_blank() | draws nothing, and assigns no space | none

In ggplot2 3.4.0 and later, the size parameter of element_line() and element_rect() is replaced by linewidth.


inherit.blank is an additional parameter you can use in these functions. If set to TRUE, the existence of a blank element among the parents of this element will cause this element to be blank as well. For example, axis.title is the parent of axis.title.x. It acts as a conditional that stops a child element from overriding a parent that was previously set to element_blank().
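As a minimal sketch of the theme(element.component = element_*()) pattern, the block below builds a line plot from toy.df, a small hypothetical data frame standing in for the lecture's PHU data, and then alters one parent element and a few of its children. Setting text changes every text element, while axis.title.x = element_blank() removes only the x-axis title.

```r
library(ggplot2)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

themed.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  theme(
    text         = element_text(size = 16),                     # parent: every text element
    axis.title   = element_text(face = "bold"),                 # both axis titles
    axis.title.x = element_blank(),                             # child: drop only the x title
    axis.text.y  = element_text(colour = "grey30", angle = 45)  # y-axis tick labels
  )
```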


1.1.1 Move your legend(s) using the legend.position option

When we are looking to move our legends to different positions, there are 2 areas to consider. The first is the plot space itself, which surrounds the data panel (where our data is plotted); the second is the data panel. The legend.position parameter accordingly takes two types of values. The first is a set of character values: "top", "bottom", "left", and "right", which place the legend within the plot area ("none" removes it entirely).

Let's start with altering our legend position. It's taking up quite a bit of space on the side. We'll worry about the label issues later but for now, let's move it to the bottom of the plot. At the same time, let's increase our overall text size for the plot.
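A minimal sketch of that step, again using the hypothetical toy.df in place of the lecture's data: one theme() call moves the legend below the panel and enlarges all text at once.

```r
library(ggplot2)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

bottom.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  theme(
    legend.position = "bottom",     # "top", "bottom", "left", "right" or "none"
    text = element_text(size = 18)  # enlarges every text element in one step
  )
```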


1.1.1.2 Move a legend to within your plot space

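Instead of moving the legend to the bottom of our plot, let's use the empty space in the top left corner of the panel by accessing the coordinate system ([0:1], [0:1]) that represents the relative positioning of elements within the panel. This system follows a c(x, y) setup that matches the plotting space, with (0, 0) representing the lower left corner and (1, 1) the upper right corner. From ggplot2 3.5.0 onwards, numeric positions are expected in legend.position.inside together with legend.position = "inside"; a numeric legend.position still works there but triggers a deprecation warning.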

Before we move the legend onto our panel, however, we also have to remember where the legend itself is anchoring when we move it. Are we asking to put the bottom-right corner of the legend into the top-left corner of the plot? Or do we want to match the legend anchor so that the top-left corners are aligned?

Use the legend.justification parameter to properly set this property when moving your legend. It uses the same two-point coordinate concept that we'll use for legend.position.
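Here is a sketch combining both parameters, assuming the hypothetical toy.df, whose lines rise from left to right and leave the top-left corner empty. It uses the numeric legend.position form from this lecture's era; the comment notes the newer spelling.

```r
library(ggplot2)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

inside.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  theme(
    # relative panel coordinates: c(0, 0) is bottom-left, c(1, 1) is top-right.
    # ggplot2 >= 3.5.0 prefers legend.position = "inside" with
    # legend.position.inside = c(0.02, 0.98); this form still works with a warning.
    legend.position = c(0.02, 0.98),
    # anchor the legend by its own top-left corner so it stays inside the panel
    legend.justification = c(0, 1)
  )
```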


1.1.2 Update the background panel and lines

There are a few more things we can do to the plot for now that include updating the background panel to get rid of the grey colour and maybe darkening our axis tick lines and axis lines themselves.

  1. We'll use the panel.background parameter, which expects an element_rect() to define its properties.
  2. panel.grid.* gives us access to the background grid lines using element_line().
  3. We'll work with axis.* elements to update their look a bit too.
  4. Let's spice up the plot a little bit by setting the overall background colour with plot.background.
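The four steps above can be sketched in a single theme() call. This assumes the hypothetical toy.df; the colours are arbitrary choices for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

panel.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  theme(
    panel.background = element_rect(fill = "white", colour = "grey40"), # panel fill + border
    panel.grid.major = element_line(colour = "grey85"),                 # major grid lines
    panel.grid.minor = element_blank(),                                 # remove minor grid
    axis.line        = element_line(colour = "black"),                  # x and y axis lines
    axis.ticks       = element_line(colour = "black"),                  # darker tick marks
    plot.background  = element_rect(fill = "ivory")                     # whole-figure background
  )
```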

1.2.0 Use premade themes from ggplot2

In our above example we made alterations to the theme that affected background colour and axis lines. While some of you may lean towards the more artistic side, you can also use premade themes from both the ggplot2 package and an additional package named ggthemes. Below you'll find a list of the themes from ggplot2.

Theme | Description
theme_gray() | Grey background colour, white grid lines.
theme_bw() | White background colour, grey grid lines.
theme_linedraw() | White background colour, black lines of various widths.
theme_light() | White background colour, grey lines of various widths.
theme_dark() | Dark background colour, grey lines of various widths.
theme_minimal() | No background annotations, grey lines.
theme_classic() | White background, x/y axis lines, no grid lines.
theme_void() | A completely empty theme: white background, no axis or grid lines.

If you find a theme that you mostly like, you can use that as a base to your graph before making additional theme() alterations. Let's try a few of these out.
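A minimal sketch of that workflow, assuming the hypothetical toy.df. Order matters: a complete theme such as theme_minimal() replaces all earlier theme settings, so targeted theme() tweaks go after it.

```r
library(ggplot2)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

minimal.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  theme_minimal(base_size = 16) +           # complete theme first...
  theme(legend.position  = "bottom",        # ...then targeted overrides
        panel.grid.minor = element_blank())
```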


1.3.0 ggthemes mimics visual styles from multiple sources

If you are feeling a little more daring with your choices, you can turn to the ggthemes package to mimic styles from a number of publications such as The Economist and The Wall Street Journal. You can look up a list of the various themes at https://github.com/jrnold/ggthemes.

Like the themes provided by ggplot2, you can also make edits to these themes within your scripts.
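A short sketch, assuming the ggthemes package is installed and using the hypothetical toy.df: theme_economist() and scale_colour_economist() come from ggthemes, and the final theme() call edits the borrowed theme just as it would a ggplot2 one.

```r
library(ggplot2)
library(ggthemes)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

econ.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  theme_economist() +          # Economist-style background, fonts and grid
  scale_colour_economist() +   # matching colour palette from ggthemes
  theme(legend.position = "right",          # our own edits on top
        legend.title    = element_blank())
```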

Two additional package options with different colour palettes and shapes are ggthemr and ggsci.


2.0.0 Text content can be updated through a number of layers

Now that we have played around with how to reposition legends, and other elements of your plot, we can discuss how to change the actual text content of your plot. Many times we want to relabel axes or legends, even legend labels. There are a number of layers we can work through but we'll present some of the simplest ways to accomplish this.


2.1.0 Label titles and axes individually or with the labs() command

Up to this point, we've seen the use of different commands to alter the labels and titles like xlab(), ylab(), and ggtitle().

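You can also access multiple options within a single command, labs(), which accepts the following parameters: title, subtitle, caption, and tag, along with the name of any aesthetic in your plot (e.g. x, y, colour, fill) to set the title of that axis or legend.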

Let's relabel our plot axis and titles to be more accurate. For now we'll drop the Stata theme and go with our own alteration of theme_minimal(). We'll also include a caption in the bottom right to explain how we display the 14-day rolling mean.

Note: a quick way of adding space around your titles is to include the \n character, which produces a new line.
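A sketch of a single labs() call that sets every piece of text at once, assuming the hypothetical toy.df (the wording of the labels is illustrative only). Note the legend title is set through the colour aesthetic's name.

```r
library(ggplot2)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

labelled.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  theme_minimal() +
  labs(
    title    = "Weekly COVID-19 cases by public health unit\n",  # "\n" adds a line of space
    subtitle = "Hypothetical values, July to November 2020",
    x        = "Date",
    y        = "Cases",
    colour   = "Public health unit",   # legend title for the colour aesthetic
    caption  = "Illustration only: weekly points joined by lines"
  )
```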


2.2.0 Relabel axis ticks, and legend labels with the labels parameter

The scale_*() functions can also be used to set the title, limits, breaks, and labels along your axes. Some of these parameters are redundant and can override other ggplot2 layer commands, depending on the order you have included them.

Parameter | Equivalent ggplot2 layer command or purpose
name | xlab(), ylab(), labs(x = ...), labs(y = ...)
limits | xlim(), ylim()
breaks | Determines where axis tick marks are generated
labels | Renames the labels present at axis tick marks

2.2.1 Relabel discrete axes and legend labels

For various reasons, you may have categorical or grouped data with unusual names. It may be convenient to code your data this way, but letting ggplot2 assign these names to your axes or labels may not be suitable. Instead, you can manually rename them using the labels parameter within your various scale_*_discrete() layers. When naming these manually, be sure to supply a vector with the correct number of elements to match the number of levels in your categories or groups.

Let's start by relabeling our x-axis to show us our dates by month, and at the same time set a limit to show us data starting in July of 2020. We've done this before so it should be easy.
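The sketch below shows both ideas from this section under stated assumptions: age.df is a hypothetical data frame with coded age-group names relabelled through scale_x_discrete(), and toy.df is a hypothetical stand-in for the PHU data with monthly date breaks starting in July 2020.

```r
library(ggplot2)

# Hypothetical coded age groups (levels sort alphabetically in this order)
age.df <- data.frame(
  age_group = c("a00_19", "a20_39", "a40_59", "a60_up"),
  cases     = c(120, 340, 280, 190)
)

age.plot <- ggplot(age.df, aes(x = age_group, y = cases)) +
  geom_col() +
  # one label per level, listed in the same order as the levels
  scale_x_discrete(name = "Age group",
                   labels = c(a00_19 = "0-19", a20_39 = "20-39",
                              a40_59 = "40-59", a60_up = "60+"))

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

date.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  # monthly breaks labelled like "Jul 2020", starting on July 1, 2020
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y",
               limits = as.Date(c("2020-07-01", "2020-11-30")))
```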


2.2.2 Relabel continuous axis ticks by altering limits and breaks

You can use a similar trick to relabel with continuous data by setting the frequency of tick marks in a scale_*_continuous() layer. There are a number of ways to generate the actual list of tick marks but a character or numeric vector must be assigned to the breaks parameter. You can additionally relabel these tick marks but the vector of labels supplied must match the length of your breaks.

Let's break our y-axis into major tick-marks of every 200 cases by altering scale_y_continuous(). While we're playing with scale, let's update our colour scheme as well.
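A minimal sketch of the y-axis change, assuming the hypothetical toy.df: breaks places a major tick every 200 cases, and the labels vector is built from the same sequence so the two lengths always match.

```r
library(ggplot2)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

tick.values <- seq(0, 800, by = 200)   # 0, 200, 400, 600, 800

breaks.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  scale_y_continuous(
    limits = c(0, 800),
    breaks = tick.values,                   # major tick every 200 cases
    labels = paste(tick.values, "cases")    # must be the same length as breaks
  )
```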


2.3.0 Colour palettes!

Up to this point, we've danced around the idea of colour in our lectures and assignments. For those of you that aren't familiar with your colour choices, here is a quick breakdown of colour palettes.

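A common thing to want to do is to change colours from ggplot2's default rainbow palette. There are many reasons to change a colour palette, including accessibility for colour-blind readers and matching the palette to the type of data you are displaying.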

When we are talking about colour palettes and their purpose, there are 3 main types.

2.3.0.1 Use sequential colour palettes to display low to high values

Sequential - implies an order to your data, i.e. light to dark implies low values to high values. These are helpful when working with continuous data scales of increasing value, e.g. heatmaps.


2.3.0.2 Use diverging colour palettes to highlight the middle and extremes of a distribution

Diverging - low and high values are extremes, and the middle values are important. These palettes typically run from a light middle colour out to two dark, contrasting colours at either end, so 3 colours are mainly used.


2.3.0.3 Use qualitative colour palettes for categorical data

Qualitative - there is no quantitative relationship between colours. This is usually used for categorical data when you want each category to be visualized distinctly.


2.3.1 Add a colour palette to a plot like a layer

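Let's test one of the RColorBrewer palettes on our data. We'll add it as a layer to phu_window.plot using scale_colour_brewer() to override the colour mappings defined in the aes() layer of the plot. Some parameters we can keep in mind: type (one of "seq", "div", or "qual" for sequential, diverging, or qualitative palettes), palette (a palette name such as "Set1", or an index into the palettes of that type), and direction (1 for the default order, -1 to reverse it).

Note that colour palettes are not recycled when plotting in ggplot2. If a Brewer palette does not have enough colours to match your groups, ggplot2 warns you and the unassigned groups will simply not be displayed. With scale_*_manual(), supplying too few values produces an error instead.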
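A sketch of adding a Brewer palette as a layer, assuming the hypothetical toy.df in place of phu_window.plot. "Dark2" is one of RColorBrewer's qualitative palettes; the test checks that each of the two groups received its own colour.

```r
library(ggplot2)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

brewer.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  # overrides the default colour mapping from aes()
  scale_colour_brewer(type = "qual", palette = "Dark2", direction = 1)
```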

More information on palette order and other parameters can be found here


2.3.2 You can always pick your own colours!

You can always choose a vector of your own colors using this 'R color cheatsheet' (https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf).

Names of colours as well as hex colour codes are accepted. You can supply a manual list using the scale_*_manual() command.
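A sketch with scale_colour_manual(), assuming the hypothetical toy.df: a named vector ties each group to a colour (one hex code, one colour name). The second part demonstrates the caveat above that a manual scale with too few values fails when the plot is built.

```r
library(ggplot2)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

base.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) + geom_line()

# names match the group levels; hex codes and colour names can be mixed
manual.plot <- base.plot +
  scale_colour_manual(values = c(Toronto = "#1B9E77", Peel = "darkorange"))

# only one colour for two groups: building this plot raises an error
too.few.plot <- base.plot + scale_colour_manual(values = "darkorange")
build.result <- tryCatch(ggplot_build(too.few.plot), error = function(e) e)
```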


2.3.3 Colour-blind friendly palettes can be found in the viridis package

The viridis package also has some nice colour palettes (https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html). These palettes are sequential and perceptually uniform, designed to show true colour change across continuous scales while remaining readable for colour-blind viewers. You've seen them come up a few times in our data, and they do well for small categorical sets but begin to blend together as the number of categories increases.

The main calls we can use follow the format scale_*_viridis_c/d/b(), where "c/d/b" represents continuous/discrete/binned data; these versions ship with ggplot2 itself. Additional parameters that can be used to set the colours when called include option (for example "A"/"magma", "B"/"inferno", "C"/"plasma", "D"/"viridis", the default, or "E"/"cividis"), begin and end (the portion of the colour map to use, between 0 and 1), direction (1, or -1 to reverse the order), and alpha (transparency).
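A sketch of the discrete and continuous variants, assuming the hypothetical toy.df and a small hypothetical grid (tile.df) for a heatmap-style plot. begin/end trim the extremes of the colour map and direction = -1 reverses it.

```r
library(ggplot2)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

# discrete groups: "C" is the plasma option
viridis.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  scale_colour_viridis_d(option = "C", begin = 0.1, end = 0.8, direction = -1)

# continuous values: a hypothetical 5 x 5 grid drawn as a heatmap
tile.df <- expand.grid(x = 1:5, y = 1:5)
tile.df$value <- tile.df$x * tile.df$y
tile.plot <- ggplot(tile.df, aes(x = x, y = y, fill = value)) +
  geom_tile() +
  scale_fill_viridis_c(option = "D")
```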


3.0.0 Annotating your plots

After preparing your visualization you may consider adding extra annotations. These are usually layers that don't affect the aesthetics or data of your visualization, but depending on how you add them and the package you are using, this isn't strictly true. For the most part, however, let's consider your annotations as separate from your plot.

3.1.0 annotate() plots with shapes, text, and arrows.

Sometimes you need to add some additional text, or shapes to your graph that aren't necessarily a part of the data itself. in other words you would like to annotate your plot. To accomplish this you can use the annotate() function which will essentially add geoms to your plot. While these annotations can affect the scale of your plot if required to show your annotation(s), they won't affect the legends nor be treated as actual data - just an overlay to your plot.

The annotate() function has the following characteristics:

Parameter | Description
geom | Can be any number of possible values including "text", "rect", and "segment"
x, y, xmin, xmax, ymin, ymax, xend, yend | Positioning aesthetics, where at least one of these must be defined
... | Other aesthetic arguments that can be passed along, like color = "red"
na.rm | If FALSE, missing values are removed with a warning; if TRUE, they are silently removed

Up to this point we've already added some annotations to this plot in previous lectures. Today we'll try adding a few bits of text and line segments with arrows instead of boxes.
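A minimal sketch of three annotate() layers on the hypothetical toy.df: a text note, a segment with an arrowhead (arrow() and unit() come from the base grid package), and a shaded rectangle whose -Inf/Inf limits span the full panel height. The dates and text are arbitrary illustrations.

```r
library(ggplot2)
library(grid)   # arrow() and unit()

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

annotated.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  # free-floating text; hjust = 0 left-aligns it at x
  annotate("text", x = as.Date("2020-07-08"), y = 550,
           label = "Hypothetical note:\ncases climb steadily", hjust = 0) +
  # line segment ending in an arrowhead
  annotate("segment", x = as.Date("2020-08-20"), y = 500,
           xend = as.Date("2020-09-20"), yend = 330,
           arrow = arrow(length = unit(0.25, "cm"))) +
  # translucent box covering September across the whole y range
  annotate("rect", xmin = as.Date("2020-09-01"), xmax = as.Date("2020-09-30"),
           ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "grey50")
```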


3.2.0 Data labeling with annotations

Unlike the annotations we just discussed, you may wish to directly label or output information based on your data from the plot. This can be in the form of error bars, or data labels. Sometimes you may want to include your sample size or further highlight your outliers.

3.2.1 Label data directly with the directlabels package

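If for some reason you need to label your plot data directly, the geom_dl() layer from the directlabels package can be quite useful. The package is designed to replace colour legends with direct labels, since this can (sometimes) be a little cleaner and less confusing. Parameters you should set when working with geom_dl() are the label aesthetic, mapped inside aes() to the column holding your group names, and method, which names the positioning strategy (for example "first.points" or "last.points").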

Note that adding direct labels this way, however, will not remove the corresponding legend from the plot. It will simply add extra geoms to your plot.
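A sketch assuming the directlabels package is installed and using the hypothetical toy.df: geom_dl() places each PHU name at the last point of its line, the legend is removed by hand because geom_dl() leaves it in place, and the x-axis is padded on the right so the labels fit.

```r
library(ggplot2)
library(directlabels)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

dl.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  geom_dl(aes(label = phu), method = "last.points") +      # label at each line's end
  scale_x_date(expand = expansion(mult = c(0.05, 0.2))) +  # room on the right for labels
  theme(legend.position = "none")                          # geom_dl() keeps the legend otherwise
```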


3.2.2 Label data using the direct.label() feature

For simplicity, you can also call direct.label(), which will automatically remove the associated legend from your plot. You can use it by providing a ggplot object as its first argument, p, and, optionally, a positioning method such as "last.points".
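The same labelling done with direct.label(), assuming the directlabels package and the hypothetical toy.df. The function wraps a finished plot, so no extra theme() call is needed to drop the legend.

```r
library(ggplot2)
library(directlabels)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

base.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line()

# wraps the whole plot: adds labels and removes the colour legend
direct.plot <- direct.label(base.plot, method = "last.points")
```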


3.3.0 Emphasize your data groups with gghighlight()

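You may find yourself in an instance where you have too many data groups to present (i.e. 34 PHUs) but would still like the audience to get an overview of your dataset while focusing on a few items. As we have done in the past, you could break groups out using facet_*() but that isn't ideal. We have also filtered for the top PHUs from a previously generated list, but then we get no sense of the other PHUs at all.

Instead you can use the gghighlight() layer from the package of the same name. Some helpful parameters from this function include: one or more predicates (expressions evaluated per group, such as max(cases) > 500; a non-logical expression is used to rank the groups instead), max_highlight (the maximum number of groups to highlight, 5 by default), unhighlighted_params (a list of aesthetics, such as colour, for the greyed-out groups), and use_direct_label (whether to label the highlighted groups directly instead of using a legend).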

Let's plot all of our PHU data onto the graph and only highlight the top 4 PHUs as before. We'll have to do some extra fiddling to make it work just right.
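A sketch assuming the gghighlight package is installed. many.df is hypothetical: 6 PHUs whose weekly cases rise at different rates. The non-logical predicate max(cases) ranks the groups by their peak, and max_highlight = 4L keeps the top 4 in colour while the rest stay grey in the background.

```r
library(ggplot2)
library(gghighlight)

# Hypothetical data: PHU j rises by 15 * j cases per week over 10 weeks
many.df <- data.frame(
  week  = rep(1:10, times = 6),
  phu   = rep(paste("PHU", 1:6), each = 10),
  cases = as.vector(outer(1:10, (1:6) * 15))
)

highlight.plot <- ggplot(many.df, aes(x = week, y = cases, colour = phu)) +
  geom_line() +
  # rank groups by their peak and colour only the top 4
  gghighlight(max(cases), max_highlight = 4L, use_direct_label = FALSE)
```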


4.0.0 Annotations and theme alterations through other layers

4.1.0 Annotate error bars with geom_*()

When working with bar or line plots where you may have generated information such as a mean with standard deviation, you can plot that information with geom_errorbar(). Unlike the annotations from above, this is a specific geom and is treated by the plot like any other geom_*() we've encountered. Under its aes() argument you can specify the ymin and ymax values or data sources. If you already have pregenerated columns for these values, you can use them directly, or you can calculate them on the fly if you have just a mean and standard deviation.

There are alternative formats of the geom_errorbar() as well:

geom | Description
geom_crossbar() | A hollow box with the middle indicated by a horizontal line.
geom_errorbarh() | A horizontal version of the error bar.
geom_linerange() | Draws an interval using a single vertical line.
geom_pointrange() | Same as a linerange, except an additional point is plotted in the middle of the range.

Let's recreate one of our plots from lecture 2 using summary data and some of these new geoms!
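A sketch of the summary-then-plot approach on simulated data (daily.df is hypothetical, not covid_demographics_total.df): dplyr computes a mean and standard deviation per age group, and the error-bar limits are calculated on the fly inside aes(). geom_pointrange() is layered on the same range to show one of the alternatives from the table.

```r
library(ggplot2)
library(dplyr)

# Hypothetical simulated daily case counts per age group
set.seed(2021)
daily.df <- data.frame(
  age_group = rep(c("0-19", "20-39", "40-59", "60+"), each = 30),
  cases     = rpois(120, lambda = rep(c(40, 90, 70, 50), each = 30))
)

# one row per age group with its mean and standard deviation
summary.df <- daily.df %>%
  group_by(age_group) %>%
  summarise(mean_cases = mean(cases), sd_cases = sd(cases), .groups = "drop")

error.plot <- ggplot(summary.df, aes(x = age_group, y = mean_cases)) +
  geom_col(fill = "grey80") +
  # mean +/- SD calculated on the fly
  geom_errorbar(aes(ymin = mean_cases - sd_cases, ymax = mean_cases + sd_cases),
                width = 0.2) +
  # point-and-line alternative drawn over the same range
  geom_pointrange(aes(ymin = mean_cases - sd_cases, ymax = mean_cases + sd_cases),
                  colour = "firebrick")
```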

Now that we've gone and built ourselves an extremely strange plot, (remember, this is just an example) there are a few things we can fix/play with.

  1. You'll note that we only get 6 shapes plotted for our age groups.
  2. Our legend titles should be corrected
  3. We'll reorder our legends.

4.2.0 Alter your legends with the guide parameter or guides() layer

Normally you can let ggplot2 take the wheel and automatically generate guides for you. Whenever you set colour/fill/linetype etc in your aesthetics, this will generate a legend. When the groups are mapped in the same way between different aesthetics, the legends may be combined.

There may be instances, however, when you need to adjust your legend or get rid of it altogether. This could range from titles, to combining your guides across different aesthetics commands. There are a number of ways to achieve the same result when working with guides and we'll go through a number of examples. First, however, we should discuss the types of legends:

guide | short call | Description
guide_legend() | "legend" | The base prototype of the legend, which shows how geoms are mapped to values.
guide_bins() | "bins" | A binned version of the legend that places ticks between keys and has its own small axis.
guide_colourbar() | "colourbar" | For mapping continuous colour/fill scales created with scale_fill_*() and scale_colour_*().
guide_coloursteps() | "coloursteps" | A version of guide_colourbar() for binned colour and fill scales rather than gradients.

Within each of the guide types, you can update parameters about text within the legend.

Component | Sub-components
title | name, position, theme, hjust, vjust
label | name, position, theme, hjust, vjust
key | width, height
order | the order of the guide among others, as an integer in [1:99]; 0 sets the order by an algorithm
other | direction of the guide, number of rows/cols

So where can you use these methods?

Within each scale_*() you declare, you can set the parameter guide to one of the above guide types. To exclude a legend for that particular scale, set the value to "none" (older code may use FALSE, which is deprecated as of ggplot2 3.3.4).

Alternatively, you can use the guides() call to set multiple guides at once using the scale types as parameters, i.e. colour, size, shape.
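A sketch of both routes on the mpg dataset that ships with ggplot2 (standing in for the lecture data). The colour legend is configured inside its scale; guides() then sets the shape legend and hides the size legend. The order values put the shape legend first.

```r
library(ggplot2)

guide.plot <- ggplot(mpg, aes(x = displ, y = hwy,
                              colour = class, shape = drv, size = cyl)) +
  geom_point(alpha = 0.6) +
  # route 1: set the guide inside the scale itself
  scale_colour_discrete(name = "Vehicle class",
                        guide = guide_legend(order = 2, ncol = 2)) +
  # route 2: several guides at once, keyed by aesthetic name
  guides(shape = guide_legend(title = "Drive train", order = 1),
         size  = "none")   # drop the size legend entirely
```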

4.2.1 Force an override to the plot aesthetics to adjust the legend

Before we leave the guides() section, we should update our plot one last time. When you are working with so many shapes, sometimes, they can show up a little smaller than you want. You may wish to increase their size on the plot but that may disproportionately increase their size on the legend. You can adjust or hold the key objects to a specific size using the override.aes parameter. We'll be applying this within our guides.
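A minimal sketch on the built-in mpg data: points are drawn large and semi-transparent on the panel, while override.aes holds the legend keys at a smaller, fully opaque size.

```r
library(ggplot2)

override.plot <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point(size = 4, alpha = 0.4) +              # large, faint points on the panel
  guides(colour = guide_legend(
    title = "Drive train",
    override.aes = list(size = 2, alpha = 1)       # legend keys only: smaller and solid
  ))
```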


4.3.0 The ggforce package annotates with simple geom_mark_*() options

The ggforce package brings helpful geoms and functions to ggplot2 that can quickly annotate groups of data within your plots. These layers work with ggplot2 like other geom_*() layers, so you can add them into your plots quite simply. These objects can also accept aesthetics parameters (including the ability to filter groups), amongst many other theme-esque parameters, and are added in an automated fashion. More information can be found here

geom | Description
geom_mark_circle() | Add circles to all of your data groups
geom_mark_rect() | Add rounded-corner rectangles to your data groups
geom_mark_ellipse() | Add ellipses to all of your data groups
geom_mark_hull() | Add a more tightly-fitted shape/blob (aka hull) around your data groups

You can also add custom shapes, specifying their type, location, etc and extensions to the facet_*() group of layers allow you to facet by different columns, zoom in on part of a graph as a facet, and split facets into multiple plots.

Let's add some ellipses to our plot and exchange our geom_line() for a smoother geom_bspline().
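A sketch assuming the ggforce package is installed. The ellipses use the built-in iris data; the spline uses a hypothetical weekly toy.df with a little alternating noise. geom_bspline() treats the points as control points, so the curve is pulled towards them rather than passing through each one.

```r
library(ggplot2)
library(ggforce)

# one ellipse per fill group around the points
ellipse.plot <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_mark_ellipse(aes(fill = Species), alpha = 0.2) +
  geom_point(aes(colour = Species))

# Hypothetical weekly toy data with alternating noise
toy.df <- data.frame(
  week  = rep(1:20, times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20)) +
          rep(c(-40, 40), times = 20)
)

# smoothed B-spline in place of geom_line()
spline.plot <- ggplot(toy.df, aes(x = week, y = cases, colour = phu)) +
  geom_point() +
  geom_bspline()
```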


4.4.0 ggbeeswarm takes plotting your points to the next level

As you noticed from last week, we used a couple of different methods for plotting our points onto our boxplots and violin plots. Those geoms are native to the ggplot2 package with some parameters that allow for a more "random" distribution of your data points within a provided area. The goal of the ggbeeswarm package is to generate points that will not overlap but they can also be used to simultaneously simulate the kernel density of your data. There are two geoms supplied that work with the ggplot2 package to accomplish this.

geom_beeswarm() has a number of parameters that can be used to set their aes() mappings, but also how the points are laid out.

geom_quasirandom() works similarly to the beeswarm function, with additional methods for how the points are plotted.
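A sketch assuming the ggbeeswarm package is installed, using the built-in iris data as a stand-in for last week's box and violin plots. cex spreads the beeswarm points further apart, and method selects one of geom_quasirandom()'s layout algorithms.

```r
library(ggplot2)
library(ggbeeswarm)

# non-overlapping points on top of a boxplot
swarm.plot <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(outlier.shape = NA) +   # hide outliers; the swarm shows every point
  geom_beeswarm(cex = 1.5)             # cex controls the spacing between points

# density-shaped scatter over a violin plot
quasi.plot <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_violin() +
  geom_quasirandom(method = "pseudorandom")
```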


4.5.0 Working with special characters and symbols in your text using expression()

Working in biological science, you will often find yourself wanting to italicize species names or add special characters when naming proteins, etc. This is not easily accomplished using the options provided by ggplot2 alone. Instead, you can generate string objects with the required font changes or symbols and then provide these objects to your plot.

There are two routes to accomplish this kind of formatting. We'll explore the first, expression(), which makes an expression object. When supplied as a label, this object is interpreted as a mathematical expression, and the output is formatted according to a TeX-like set of rules (R's plotmath) that parse the syntax. Within an expression, there are a number of names that look like functions but are interpreted by plotmath rather than by the base R functions of the same name, so don't expect the same kind of behaviours. Here is a non-exhaustive list of potential situations you may encounter.

Symbol | Description
+, -, %*%, %/%, %+-% | basic mathematical symbols for +, -, *, /, and $\pm$
paste(x, y, z), x*y*z | juxtapose x, y, and z without any separators
sqrt(x) | square root of x
sqrt(x, y) | the yth root of x
plain(x), bold(x), italic(x), bolditalic(x), symbol(x), underline(x) | draw x in normal, bold, italic, bold italic, symbol, and underlined font
list(x, y, z) | output a comma-separated list of x, y, z
hat(x), tilde(x), dot(x), bar(x) | add symbols above x
alpha to omega, Alpha to Omega | Greek symbols in lower and upper case
infinity | the infinity symbol
x ~ y, x ~~ y | put a space between x and y, or put extra space between them
phantom(0) | leave a gap for "0" without drawing it
frac(x, y), over(x, y) | output x over y
atop(x, y) | output x over y without any bar

Note from above, to build your expressions from multiple parts, you should use the * or paste() operators from within expression().
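A sketch of expression() labels on the built-in mtcars data; the label text is purely illustrative and does not describe mtcars. The title combines italic(), ~ for spacing, and * to join parts without a space around the plotmath degree symbol; the x label uses the Greek mu and the y label uses [ ] for a subscript.

```r
library(ggplot2)

# italic species name, then "growth at 37 degrees C"
title.expr <- expression(italic("E. coli") ~ "growth at 37" * degree * "C")

expr.plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    title = title.expr,
    x     = expression("Dose (" * mu * "g/mL)"),   # Greek mu joined with *
    y     = expression(OD[600])                    # subscript with [ ]
  )
```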


4.6.0 Format and interpret variables using bquote()

Unlike the expression() function, using bquote() allows you to reference information which may be stored in variables so that you can add these instead of explicitly including the words you want. When thinking about using bquote() you can break your math notation into four forms of syntax. These sections or forms can be joined with the ~ symbol.

Class of text | Syntax | Description
Strings | "my text" | Words and non-mathematical text that you want to print as-is
Math expressions | infinity, alpha, frac(x, y) | Unquoted, and essentially the same kinds of symbols usable by ?plotmath and expression()
Numbers | 1, 42, 900000 | Use unquoted when part of math notation
Variables | .(variable) | Used to pass a string or numeric value into your equation

Many R enthusiasts prefer this form of generating expressions for its flexibility to build whatever you want.
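A sketch of bquote() filling a title from variables, assuming the hypothetical toy.df and hypothetical values (phu.name, window.days). .() inserts each variable's value, ~ joins the pieces with a space, and plotmath symbols such as bar(x) and == still work alongside the inserted values.

```r
library(ggplot2)

# Hypothetical toy data standing in for the lecture's PHU case counts
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)

# hypothetical values that earlier code might have produced
phu.name    <- "Toronto"
window.days <- 14
peak.cases  <- max(toy.df$cases[toy.df$phu == phu.name])

title.expr    <- bquote(.(phu.name) ~ "cases," ~ .(window.days) * "-day rolling mean")
subtitle.expr <- bquote("Peak" ~ bar(x) == .(peak.cases) ~ "cases")

bquote.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) +
  geom_line() +
  labs(title = title.expr, subtitle = subtitle.expr)
```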


4.7.0 Marginal plots to visualize relationships and distributions from ggExtra

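Marginal plots are a very specialized plot type from the ggExtra package which combines scatterplot data with distribution data in the margins. The main plot panel has your two variables along the x and y axis. Secondary plots are made on the opposite margins and can be in the form of distribution-based objects, i.e., histograms, boxplots, etc.

The workhorse of this package is the ggMarginal() function, which takes as input parameters: p (the ggplot2 scatterplot to decorate), type (the marginal plot type, such as "density", "histogram", "boxplot", or "violin"), margins ("both", "x", or "y"), size (how large the main panel is relative to the marginal plots), and groupColour/groupFill (whether to split the marginal plots by the groups mapped to colour or fill in p).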

Let's re-imagine our PHU age group data now as a scatterplot with marginal boxplots. While this won't be the clearest visualization of this kind of data it will help to demonstrate how to generate marginal plots with your data.
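A sketch assuming the ggExtra package is installed, with the built-in mtcars data standing in for the PHU age group data. ggMarginal() takes a finished ggplot and returns a new combined object rather than a ggplot, so it should be the last step; size = 5 makes the main panel five times larger than each margin.

```r
library(ggplot2)
library(ggExtra)

scatter.plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# boxplots along both margins
marginal.plot <- ggMarginal(scatter.plot, type = "boxplot",
                            margins = "both", size = 5)
```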


5.0.0 Taking it up a notch

There are many fantastic R packages to analyze and visualize your data. As a group, we are likely working in a variety of specialized areas. The plots we have made so far today should be useful for data exploration for many different kinds of data. In this final section we are going to learn how to arrange multiple plots per page for those publication-ready figures.

5.1.0 Multiple plots on one page (ie. for publication images) with ggarrange()

There are a variety of methods to combine multiple graphs on the same page; however, ggplot2 does not work well with all of them. I am going to work with ggpubr, a package built on gridExtra (which allows us to arrange plots) that works well with ggplot2 and also allows us to align the axes of our plots. For a demonstration, we are going to take 3 plots that we made earlier (our PHU line graph, our age group plot, and our marginal plot), save them as objects, and then arrange and align them in the same figure. (http://www.sthda.com/english/rpkgs/ggpubr/)

ggarrange() is a function that takes your plots, their labels, and how you would like your plots arranged in rows and columns. To start, let's put our PHU case data (phu_cases.plot) above our PHU age group data (phu_age.plot). If you picture each plot as a square in a grid, we need one column (ncol = 1) and two rows (one for each stacked plot, nrow = 2).
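A sketch assuming the ggpubr package is installed. cases.plot and age.plot are hypothetical stand-ins for phu_cases.plot and phu_age.plot, built from small toy data frames; ggarrange() stacks them in one column of two rows and tags them "A" and "B".

```r
library(ggplot2)
library(ggpubr)

# Hypothetical toy data standing in for the lecture's objects
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)
age.df <- data.frame(age_group = c("0-19", "20-39", "40-59", "60+"),
                     cases     = c(120, 340, 280, 190))

cases.plot <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) + geom_line()
age.plot   <- ggplot(age.df, aes(x = age_group, y = cases)) + geom_col()

# one column, two rows: cases.plot on top, age.plot underneath
stacked.figure <- ggarrange(cases.plot, age.plot,
                            labels = c("A", "B"), ncol = 1, nrow = 2)
```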


5.2.0 Arrange plots within plots

Next we will add in the marginal plot by nesting a ggarrange() call within another.

Imagine a square with 4 boxes.

  1. We are going to place our line graph across the top row (top 2 boxes)
  2. We'll place our age group data in the bottom left box
  3. We'll drop our marginal plot into the bottom right box

To do this, we are arranging 2 rows (one with the line graph and one with the [age group + marginal plot], nrow = 2) and we are arranging 2 columns in the bottom row (one with the age group and one with the marginal plot, ncol = 2).
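The nesting described above can be sketched as follows, assuming ggpubr is installed. All three plots are hypothetical stand-ins: a plain mtcars scatterplot takes the place of the marginal plot to keep the example short. The inner call builds the bottom row; the outer call stacks the line graph above it.

```r
library(ggplot2)
library(ggpubr)

# Hypothetical toy data standing in for the lecture's objects
toy.df <- data.frame(
  date  = rep(seq(as.Date("2020-07-01"), by = "week", length.out = 20), times = 2),
  phu   = rep(c("Toronto", "Peel"), each = 20),
  cases = c(seq(50, 620, length.out = 20), seq(30, 410, length.out = 20))
)
age.df <- data.frame(age_group = c("0-19", "20-39", "40-59", "60+"),
                     cases     = c(120, 340, 280, 190))

line.plot    <- ggplot(toy.df, aes(x = date, y = cases, colour = phu)) + geom_line()
age.plot     <- ggplot(age.df, aes(x = age_group, y = cases)) + geom_col()
scatter.plot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# inner call: two plots side by side in the bottom row
bottom.row <- ggarrange(age.plot, scatter.plot, ncol = 2, labels = c("B", "C"))

# outer call: line graph across the top, the bottom row underneath
full.figure <- ggarrange(line.plot, bottom.row, nrow = 2, labels = "A")
```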


5.3.0 Small changes can be made with align and font()

Okay, there are a few problems with this arrangement. Spacing aside, our title in plot B has spread over into area C. If you wanted to keep it, you would have to fix up the text in the plot and try again. However, we can treat the plots much like their own data and keep altering them with the + symbol. That means for a quick fix, we could just remove the title altogether. Do you remember how to access the plot title?

Problem 2: the x-axes in our B/C plots don't line up well. Would it look better if they did? If y-axis lines or x-axis lines are not aligned, this can be fixed with a call to align = "v" or align="h".

Problem 3: the font labels denoting each plot look a little small overall. We can change this aspect with the font.label parameter.

If you wanted to make sure all axis titles are the same size you can specify these small changes using font(). You can try to access these attributes through simple names like "axis.title", and "legend.title" ie font("axis.title", size=9) but you need to set each graph and each attribute separately.

Let's drop our plot B title, and try to shore up the axes between B and C. Unfortunately we may be stopped by the crowded spacing at the bottom of these plots.
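A sketch of the three fixes, assuming ggpubr is installed and using hypothetical stand-in plots (age.plot is given a deliberately long title to remove). The title is blanked with theme(), font() sets the axis-title size on each plot separately, align = "h" lines up the side-by-side panels, and font.label enlarges the B/C tags.

```r
library(ggplot2)
library(ggpubr)

# Hypothetical stand-ins for the lecture's plots B and C
age.df <- data.frame(age_group = c("0-19", "20-39", "40-59", "60+"),
                     cases     = c(120, 340, 280, 190))
age.plot <- ggplot(age.df, aes(x = age_group, y = cases)) +
  geom_col() +
  ggtitle("A long title that spills into the neighbouring panel")
scatter.plot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

bc.row <- ggarrange(
  # drop plot B's title and shrink the axis titles on each plot separately
  age.plot + theme(plot.title = element_blank()) + font("axis.title", size = 9),
  scatter.plot + font("axis.title", size = 9),
  ncol = 2, labels = c("B", "C"),
  align = "h",                                                 # line up the two panels
  font.label = list(size = 18, color = "black", face = "bold") # larger panel tags
)
```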


6.0.0 Class summary

Today we have dug deep into altering and playing with our plots to help get them to that extra level. Although there is far more to explore, this should cover most of your needs when it comes to cleaning up your plots. Looking a little bit ahead at this week's assignment, you will look at Canada-wide vaccination data.

You now have the tools to create plots like this:

[Figure: vaccine.cumulative.facet.png]

and this:

[Figure: vaccine.boxplot.png]

I can't wait to revisit these graphs in a few months!


7.0.0 Weekly assignment

This week's assignment will be found under the current lecture folder under the "assignment" subfolder. It will include a Jupyter notebook that you will use to produce the code and answers for this week's assignment. Please provide answers in markdown or code cells that immediately follow each question section.

Assignment breakdown
Component | Weight | Criteria
Code | 50% | Does it follow best practices? Does it make good use of available packages? Was data prepared properly?
Answers and Output | 50% | Is output based on the correct dataset? Are groupings appropriate? Are titles/axes/legends correct? Is interpretation of the graphs correct?

Since coding styles and solutions can differ, students are encouraged to use best practices. Assignments may be rewarded for well-coded or elegant solutions.

You can save and download the Jupyter notebook in its native format. Submit this file to the appropriate assignment section by 12pm on the date of our next class: March 25th, 2021.


8.0.0 References

The R Graph Gallery: https://www.r-graph-gallery.com/index.html

Advanced examples of direct labeling with geom_dl(): https://directlabels.r-forge.r-project.org/examples.html

More information about the gghighlight package: https://cran.r-project.org/web/packages/gghighlight/vignettes/gghighlight.html

Using expression(): https://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/plotmath.html

Using bquote(): https://www.r-bloggers.com/2018/03/math-notation-for-r-plot-titles-expression-and-bquote/

More options for ggarrange(): https://rpkgs.datanovia.com/ggpubr/reference/ggarrange.html